AIbase

# Multimodal fusion

Wan2.1 T2V 14B FusionX GGUF
Apache-2.0
A quantized text-to-video model: a GGUF conversion of the base model that can be used in ComfyUI for text-to-video generation.
Text-to-Video English
QuantStack
563
2
Wan2.1 14B T2V FusionX FP8 GGUF
Apache-2.0
A GGUF conversion of the vrgamedevgirl84/Wan14BT2VFusionX model, mainly used for text-to-video generation.
Text-to-Video
lym00
490
4
Lilt Infoxlm Base
MIT
LiLT-InfoXLM is a language-agnostic layout transformer model created by combining the pre-trained InfoXLM with the Language-independent Layout Transformer (LiLT). It is suited to structured document understanding tasks.
Multimodal Fusion Transformers
SCUT-DLVCLab
110
5
Macbert Ngram Miao
A large language model based on the Transformer architecture that supports a range of natural language processing tasks.
Large Language Model
miaomiaomiao
22
0
© 2025 AIbase